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This paper describes a new algorithm for Monte Carlo integration, based on the 

Field Estimator for Arbitrary Spaces (FiEstAS). The algorithm is discussed in 

detail, and its performance is evaluated in the context of Bayesian analysis, with 

emphasis on multimodal distributions with strong parameter degeneracies. Source 

code is available upon request. 
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1 Introduction 



Many problems in Physics, as well as in other branches of science, involve the 
computation of an integral 

/ = / /(x) dx (1) 

Jv 

over a given multidimensional domain V. In most circumstances, the value of 
/ has to be estimated numerically, evaluating the function / at a series of TV 
sample points Xj. 

Numerical quadrature algorithms (see e.g. pQ) sample / on a regular grid or 
at a carefully selected set of unequally-spaced points (Gaussian quadrature). 
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This approach is very efficient for smooth functions, but it is certainly far from 
optimal when the integrand is strongly peaked. Adaptive techniques provide 
a solution to this problem by refining those sub-intervals of the integration 
domain where / is found to vary on smaller scales. 

In multiple dimensions, Monte Carlo methods usually yield better accuracy 
for a given N than repeated integrations using one- dimensional deterministic 
algorithms. The essence of the Monte Carlo method is that the sample points 
Xj are chosen (to some extent) at random. The most basic version uses a 
uniform probability distribution over the integration region, and the estimator 
for the integral I is given by 

V N 

^MC = "77 Yl fi ( 2 ) 

where V denotes the integration volume and f+ = /(xj). The error associated 
to Jmc can be estimated as 




^c = ^E(/«-^l (■>» 



Although the error decreases as iV -1 / 2 (this is, in fact, why Monte Carlo 
methods are preferable in multidimensional problems), the basic Monte Carlo 
is still computationally inefficient when the function / has a very narrow peak 
compared to the total integration volume. On the one hand, a large number 
of sample points would be required to obtain good estimates Imc an d A MC . 
On the other hand, and more important, it is indeed quite possible that the 
algorithm achieves convergence, reporting a wrong value of the integral with 
an extremely small estimate of the error, before finding the maximum (or 
maxima) of /. 

A refinement of this method, known as importance sampling, consists in draw- 
ing the points from a distribution similar in form to the integrand, so that they 
are more likely to come from the regions that contribute most to I. Rather 
than sampling the whole integration volume uniformly, one chooses a prob- 
ability distribution g(x) that adapts to the shape of the integrand. For any 
g(x), the integral flTJ can be re- written as 

i= I f -¥\ *( x ) dx ( 4 ) 

Jv g(x) 
and the estimators of I and its error become 



1 N f- 

fis = ^E- (5) 

N U.9i 



,2 ' \"" / Ji 



and 

A «^m-'' s ) (6) 

respectively. The key issue is, of course, the choice of the distribution g(x). 
These two expressions reduce to ([2]) and ([3]) for the particular case of uniform 
sampling, i.e. all g>j = g'(xj) = 1/V. In the ideal case, when g(x) = |/(x)|/J, 
all points contribute exactly the same amount to the sum in ([5]) and zero 
in (EJ), and the algorithm takes only a few iV to achieve the desired accuracy. 
Unfortunately, choosing g(x) is not straightforward when the shape of / is 
unknown a priori, and evaluating it is actually part of our goal. 

A perfect example of such a situation is Bayesian inference, where one tries 
to select the model M that best describes the data D and estimate the values 
of the model parameters 0. For a given model, Bayes' Theorem states that 

where p(@\M) is the prior probability distribution of the model parameters, 
p(D\M, 0) is the likelihood of the data, p(&\M, D) is the posterior probability 
distribution of the parameters, and p(D\M) is the evidence for model M, 



p(D\M) = fp(@\M) p(D\M, 0) d© 



The posterior probability distribution (j7j) contains all the information needed 
for parameter estimation, and the evidence plays in this case the role of a mere 
normalization constant. Nevertheless, the value of the integral (jHJ) becomes 
critical in model selection. Applying again Bayes' Theorem, two models Mi 
and M 2 can be compared by evaluating the Bayes' factor, 

p{Mi\D) = g(Mi) P(D|M!) 

p(M 2 \D) p{M 2 ) p(D|M 2 ) l ' 

where p(Mi) denotes the prior probability of each model, p(D|M$) are their 
evidences, and p(Mj|D) are the posterior probabilities. Even if one is only 
interested in parameter estimation, it is good practice to quote the evidence 
(as well as the assumed priors) in order to make possible the comparison with 
future work. 

The computational part of a Bayesian analysis consists thus in the evaluation 
of the integral / = p(D\M) = J f(&) d0, where /(0) = p(&\M) p(D\M, 0), 
over the region of the parameter space where p(®\M) > 0. Since each eval- 
uation of the likelihood may involve a significant computational cost, the al- 
gorithm should locate the the maxima of /(0) as efficiently as possible, even 
when this function has a complicated structure. In particular, it is not un- 
common that /(0) presents several maxima, corresponding to different sets 



of parameter values that provide a good fit to the data. Which solution is 
to be preferred depends, within the Bayesian framework, on the value of the 
integral (the evidence) over the region associated to each peak. Even more 
often, there are strong degeneracies in parameter space. The maxima of the 
function / can be extremely elongated, generally with a curved shape arising 
from a non-linear relationship between two or more parameters. 

These two problems (multimodality and curved degeneracies) have proven dif- 
ficult to overcome for many standard algorithms, especially as the number of 
dimensions increases. Markov Chain Monte Carlo methods (see e.g. [2]) typ- 
ically require a relatively large number of evaluations. Nested sampling j3] 
provides a more efficient alternative, transforming the multidimensional inte- 
gral ([1]) into a single dimension by means of a re-parameterization in terms of 
the 'prior mass' p(&\M) d@. This formalism has recently been extended to 
cope with the problems of multimodality and curved degeneracies by resort- 
ing to clustering algorithms [Hf5] . On the other hand, traditional importance 
sampling methods (e.g. Vegas [6]) assume a separable weight function, i.e. 
<?( x ) — Ud=i 9d(xd), which can give a good approximation to /(x) only as long 
as the characteristic features are well aligned with the coordinate axes. An 
adaptive refinement technique has been implemented in the algorithm Suave 
[7] to improve the performance in the general case. 

Here I present a new method, based on the Field Estimator for Arbitrary 
Spaces (FiEstAS) developed by [8] to estimate the continuous density field 
underlying a given point distribution. The algorithm is fully described in Sec- 
tion [21 results are presented in Section [21 and the main conclusions are sum- 
marized in Section [U 



2 Description of the algorithm 



FiEstAS sampling performs multidimensional numerical integration by means 
of Monte Carlo importance sampling. Here I discuss in detail the most relevant 
algorithmic issues; actual source code is available upon request. 

The method has three free parameters: the desired relative accuracy e, the 
fraction of points that are uniformly distributed 7] u , and the fractional increase 
in the number of points per step rj n . Default values are e = 0.01, r] u = 0.1, and 
r] n = 0.1. For the first step, the number of new sample points is set to n — 1. 

The main loop of the program can be summarized as 

(1) generate r\ u n new points uniformly 

(2) compute g(x) for this step using FiEstAS 



(3) sample uf = (1 — T] u )n new points from g(x) 

(4) evaluate the integral and its error 

(5) increase n by a fraction r\ n 

(6) repeat until the desired accuracy is achieved 



2.1 Uniform sampling 



FiEstAS sampling is intended to adapt the probability distribution g(x) to 
the shape of the integrand within the fewest possible number of iterations. 
As will be shown below, the presence of pronounced curved degeneracies does 
not pose a significant problem to the algorithm, but the discovery of narrow, 
isolated maxima is, at best, a very hard problem, and a large number of 
sample points may be required in order to achieve the correct result. Even 
worse, there can be no guarantee that all the maxima of / have been located, 
and therefore it is important to reach a compromise between the detection of 
potential multimodality and sampling efficiency. 

Sampling from a uniform distribution is obviously the most unbiased way to 
explore the integration domain in search of previously unknown maxima. The 
fraction of points (i.e. computational effort) devoted to such exploration is 
controlled by the parameter < T) u < 1. A low value would be adequate for 
functions where multimodality is not a concern, either because the positions 
of the maxima are already known by other means, because there can only 
be one, or because the different peaks are not too isolated (i.e. the contrast 
between the maximum and the saddle point towards the nearest detected peak 
is not large). A high value of rj u would help the algorithm locate the peaks, but 
it would render it slow. What constitutes a reasonable compromise between 
accuracy and computational resources depends, of course, on the details of 
the application being considered, and the adopted default value should not 
preclude the user from applying his/her own judgment. 



2.2 Choice of g(x) 



The prescription adopted for g(x) is arguably the heart of the method. The 
idea is to use the set of sample points computed so far to estimate the shape of 
the integrand. First, each point is assigned a hypercubical cell by the FiEstAS 
algorithm (see [8] and Appendix |A]) . Let the volumes of these cells be denoted 
as V{. Then, 

9(X ) = £ ;r n,(x) do) 

with Ilj(x) = 1 if x belongs to the i-th cell, and otherwise. 



The choice of weights Wi = 1 corresponds to uniform sampling, and Wi = \fi\ 
corresponds to g(x) w |/(x)|/7. This is the optimal probability distribution 
when the integrand shape is already well described by the sample points, i.e. 
/(x) ~ Y^iLi /illj(x). In practice, convergence is faster if one uses instead 



1 L np.i a — i 



where the sum is performed over the n ne j neighbouring cells in the FiEstAS 
tessellation, including the cell i itself. This prescription tends to sample regions 
where |/| is large and/or varies on small scales (comparable to the sampling 
density), and therefore it combines in a simple way the advantages of impor- 
tance and stratified sampling. Considering the values of / in adjacent cells is 
important in order to efficiently explore those regions where large gradients 
are found, such as newly discovered maxima or pronounced degeneracies. 



2.3 Sampling from g(x) 



Each step, tif = (l — r] u )n new points are generated from the distribution (TTOj) . 
The probability that each of these points belongs to cell % is given by 

WiVi 

Pi = 9iVi = —jf (12) 

and therefore the cumulative distribution can be computed as 

P, = ± Pl = ^l±^ (13) 



3=1 

with Pn = 0. 



Pn 



Sampling from P; is then trivial. A uniform random number between and 
1 is generated, and the value of i is obtained from a binary search. The new 
point is created at a random location, uniformly distributed within cell i. 



2.4 Evaluation of the result 



From the np = (1 — r] u )n new points generated during the current step, 



1 m? f 

n F i=l 9i 



6 



and 




*=;*££-'■'■ (15) 



To estimate the value of J, the estimates of the last S steps are combined 
according to 

1 s -~ l - 

i = ~ E J.-* ( 16 ) 

15 i=0 



with an associated error 

A 2 



1 ^V ^2 



5(5- 



- 1 / i=0 



The number of steps S is set by the condition 



ls-S 



-I 



>3min{A s ,A 5 } 



if 5 > 4. Else, the procedure is assumed not to have converged yet, and only 
the information from the last step is considered (i.e. I — I s and A = A s ). 



2. 5 Iteration 



At the end of each step, the total number of new points n is increased by a 
fixed fraction r\ n . Since the FiEstAS tree has to be updated once per step 
in order to estimate g(x), a small value of r] n increases the overhead of the 
algorithm. However, a frequent update results in a more accurate sampling, 
and therefore a smaller number of function evaluations. A compromise should 
be sought for each particular problem, taking into account the computational 
cost of evaluating the integrand and setting i] n so as to minimize the total 
CPU time. Otherwise, the precise value of this parameter does not significantly 
affect the results. 



2. 6 Convergence 



The main loop continues generating new points until the estimated relative 
error drops below the value of the tolerance parameter e, 

A<e|7|. (19) 

Again, the default value should be interpreted as a guideline, subject to trial 
and error, common sense, and the specific requirements of the problem at 
hand. Note that, as mentioned above, the condition S > 4 is also imposed 



in order to ensure that the algorithm has actually converged. This criterion 
is only important in those few cases where a new maximum has just been 
discovered at the time A decreased below the requested tolerance. 



3 Results 



The performance of FiEstAS sampling has been tested by applying the al- 
gorithm to a series of toy multidimensional problems with known analytical 
solution. The integrands have been chosen to emphasize the issues of mul- 
timodality and the presence of non-linear degeneracies. I also consider the 
test suite proposed by Genz [9] and compare the results with those of other 
algorithms. 

Two different aspects have been considered: the accuracy of the solution (the 
actual value of the integral J, the quality of the error estimate A, and the 
ability to find all maxima) and the computational cost (in terms of the number 
of evaluations of the function / as well as the overhead in CPU time incurred 
by the FiEstAS tree). 

All the tests have been carried out on a Pentium 4 processor with a clock rate 
of 3.2 GHz. The parameters of the algorithm are always set to their default 
values. 



3.1 Toy problem I 



As a first example, the method is used to compute the integral of a linear 
combination of five multivariate Gaussians with different values of a, located at 
arbitrary points in the xy-plane. The locations and dispersions are the same as 
in [5], i.e. x { = {-0.4, -0.35, -0.2, 0.1, 0.45}, y t = {-0.4, 0.2, 0.15, -0.15, 0.1}, 
and o~i = {0.01,0.01,0.02,0.03,0.05}, but each of the Gaussian components 
has been normalized in order to facilitate the counting of the number of peaks 
detected, 

5 1 

/( X ) = E f^^D/2 eX P 



I 12 

X — Cif 



f 1 (27Taf) D /^' r ■ ' (20) 

where D is the dimensionality of the problem and Ci is the centre of each 
Gaussian, given by X{ and yi. 

The correct value of the integral (performed from —1 to 1 along all axes) is 
thus I ~ 5. In the Bayesian framework, each of the maxima would correspond 
to a different solution. Since all the individual evidences are equal, these five 
solutions would be equally valid, and none of them would be preferred over 



I 



A 



iV 



*CPU 



3.982 0.037 3974 0.31 

3.979 0.026 3603 0.27 

5.095 0.047 3603 0.27 

4.026 0.024 1990 0.14 

3.958 0.032 3603 0.27 

4.035 0.040 3603 0.27 

4.032 0.039 2682 0.19 

5.052 0.044 5325 0.42 

4.144 0.039 3603 0.27 

4.998 0.048 7127 0.57 

Table 1 

Results obtained by 10 independent runs of FiEstAS sampling for the toy prob- 
lem I in two dimensions. First and second columns quote the estimates given by 
equations ([TBI) and (fT7|) . respectively, followed by the total number of evaluations 
and the CPU time (in seconds) required by the algorithm. 




Fig. 1. Sample points used by the last run in Table Q] 

the others. From a frequentist point of view, the narrowest peaks should be 
favoured because they yield the highest values of the likelihood /. 

The results of ten independent runs with different random seeds are listed 
in Table [1] for the two-dimensional case. In a larger set of one hundred runs, 
the number of peaks identified by the algorithm was three (4 times), four 



D 


(i)± n 


(A/Atrue) ± ^A 


(N) ± a N 


(*cpu) ± a t 


2 


4.490 ± 0.485 


0.767 ± 1.836 


4400 ± 1397 


0.34 ±0.12 


3 


3.910 ±0.309 


2.128 ±3.254 


18896 ± 6003 


2.87 ±1.00 


4 


4.161 ±0.325 


0.722 ± 3.254 


86084 ± 40075 


24.64 ± 12.07 


5 


3.525 ± 1.362 


1.441 ± 2.796 


213956 ± 115194 


104.41 ± 59.64 


6 


1.790 ± 1.577 


0.796 ±2.186 


482739 ± 545036 


380.49 ± 487.6 


7 


2.001 ± 1.615 


1.343 ± 2.383 


2937754 ± 3908586 


4207.78 ± 6332 


8 


1.501 ± 1.094 


0.421 ±4.539 


6578407 ± 5835573 


13554.28 ± 15750 



Table 2 

Results for toy problem I in D dimensions, averaged over ten independent runs. 
Columns show D, the value of I, the deviation of the estimated error A with respect 
to Atrue (see text), the number of evaluations, and the CPU time in seconds. 

(56 times), and five (40 times). The narrow Gaussian at (—0.4, —0.4) was the 
most difficult to detect. Although it has the same dispersion as the second 
component at (—0.35,0.2), the latter is relatively close to another broader 
maximum, and thus it was much easier to identify. Neglecting the contribution 
from missed peaks, the estimated error always corresponded fairly well to the 
actual deviation with respect to the analytical solution. The number of sample 
points required to achieve convergence ranged from N = 1990 to 11557, and 
the run time was in all cases of the order of a few tenths of a second. The 
sample points used by the last run are plotted in Figure [TJ 

The dependence on the number of dimensions D is shown in Table [21 The 
five Gaussians are kept at the same locations in the xy-pl&ne, with the same 
dispersions and normalized to unit evidence, but they refer now to D inde- 
pendent variables ranging from —1 to 1. Each entry on the table corresponds 
to the average and standard deviation of ten independent runs with different 
random seeds. 

FiEstAS sampling typically discovers four or five peaks for D = 2, which 
leads to the average value (i) = 4.5 ± 0.5. As the number of dimensions 
increases, detecting each of the isolated maxima becomes more difficult, and 
the value of ( I) decreases. Note, however, that large dispersions ai indicate 
that the number of peaks detected may vary considerably between different 
runs. For instance, in D = 6 dimensions, the algorithm stopped after finding 
only the broad maximum at (0.45,0.1) in eight out of the ten runs, whereas 
all peaks were correctly identified in the other two. 



The values of ( A/A tru A correspond to the geometrical average of the esti- 
mated error A divided by the 'true' error A trU e = \I — I'\, where I' is the 



10 



nearest integer to I. More precisely, 



A/A t 



cxp 




and 



cta = exp 



In 2 




(21) 



f22) 



i-r\JI \ \\i-r 

Since all of the Gaussians have unit evidence, I' corresponds to both the num- 
ber of peaks detected and the analytical value of the integral after subtract- 
ing the contribution from the undetected components. The results obtained, 
^A/A true y ~ 1 ± 3, suggest that A is typically within a factor of three (above 
or below) from A true (see e.g. Tabled]). 



Finally, the exact number of evaluations required depends on the integrand 
being considered, the requested tolerance e, and, to a lesser extent, the values 
of the parameters t] u and t] n . What can be inferred from the data in Table [2] 
is that N grows exponentially with the number D of dimensions. Since the 
complexity of the FiEstAS tree is 0(N log N) [8], the overhead in CPU time 
also grows exponentially. 



3.2 Toy problem II 



In order to assess the effect of non-linear degeneracies, let us also investigate 
the toy problem proposed by [TO] , 



with 



/(x) = i?(x; Ci, n, tui) + .R(x; c 2 , r 2 , w 2 

(|x — c\ — r) 



i?(x; c, r, w) 



exp 



2w 2 



(23) 



(24) 



The function R(x) represents a ring of radius r and width w, centered at c. 

The normalization k is chosen so that J R(x)dx = 1, and it can be expressed 

as 

Dit D / 2 

- X K D - X (r,w) (25) 



K 



r(D/2 + l) v / 2 



-KW C 



where T denotes the Gamma function and Kd is given by the recursive formula 



K D (r, w)=r K D ^ + (D - l)w 2 K D „ 



(26) 



with K = 1 and K\ = r. The two rings are separated by 7 units along 
the x-axis, and their radii and widths are set to the values r\ = 1, r 2 = 2, 
and W\ = W2 = 0.1. The domain of integration ranges from —6 to 6 in all 
dimensions. 
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Fig. 2. Sample points for toy problem II in two dimensions. 



D 



I)±aj 



A/At rU c ) ± a A 



(N) ± a N 



(tcpu) ± 0"t 



2 


1.990 ±0.023 


1.792 ±2.991 


6127 ±1012 


0.47 ±0.09 


3 


2.001 ±0.017 


1.411 ±2.137 


29660 ± 8082 


4.60 ±1.35 


4 


1.984 ±0.027 


0.938 ± 2.700 


72619 ± 13452 


20.03 ±4.04 


5 


1.897 ±0.304 


1.199 ±2.473 


179333 ± 56814 


83.86 ±29.15 


6 


1.491 ±0.501 


0.933 ±3.181 


327824 ± 77328 


249.65 ± 64.26 


7 


1.196 ±0.397 


0.782 ± 2.524 


735387 ± 203382 


552.04 ± 169.9 


8 


1.381 ±0.480 


0.675 ± 2.206 


4517358 ± 3616613 


11001.25 ± 9788 


Table 3 
Results for toy problem II. 







The values of the integral computed by FiEstAS sampling are given in Ta- 
ble El and Figure [2] shows the sample points used by the algorithm in the 
two-dimensional case. The two rings have been correctly identified by the al- 
gorithm ten times out of ten up to D — 4. As the number of dimensions 
increases, the small ring occupies an exponentially smaller fraction of the in- 
tegration domain with respect to the bigger structure, and the detection rate 
decreases. For D = 8, it is found in four of the ten runs. 

The shape of the rings is always accurately recovered, which proves that the 
presence of strong degeneracies does not significantly affect the accuracy of the 
method. As in the previous examples, the estimated error is within a factor 
of three from the true value, and both the number N of evaluations and the 
overhead tcpu grow exponentially with D. 
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Model 


D 


™ Vegas 


-WSuave 


•* * Divonne 


-Vcuhre 


^FiEstAS 


I 


2 


52000 


90000 


18569 


9165 


3974 




3 


45000 


160000 


193910 


82677 


14012 




4 


175000 


320010 


285312 


215271 


168641 




5 


314500 


1270259 


86567 


1150695 


204091 




6 


- 


120036 


171526 


3971451 


224517 




7 


- 


280084 


236217 


26245785 


11186912 




8 


- 


1840397 


6619086 


116242685 


7640740 


II 


2 


45000 


60000 


6321 


19175 


5325 




3 


67500 


90000 


24700 


213233 


33236 




4 


85000 


230006 


99300 


8675253 


58999 




5 


137500 


320015 


189236 


96301023 


328797 




6 


- 


510046 


219956 


- 


397883 




7 


- 


820128 


364261 


- 


640912 




8 


- 


1120153 


122915 


- 


1828970 



Table 4 

Number of evaluations required by the algorithms Vegas [6], Suave [7J, Divonne 
[11] . and Cuhre [12] for toy problems I and II in D dimensions. Empty entries 
indicate failure to converge within N = 10 9 function calls. Results of a single run 
of FiEstAS sampling are quoted in the last column. 

3.3 Comparison with other algorithms 



The method presented here significantly outperforms the standard Monte 
Carlo with uniform probability, and its results are comparable to those of 
the bank sampling technique described in (10J . at least for problems of low 
dimensionality. An important advantage of FiEstAS sampling is that prior 
information on the shape of / is not required, although it could be easily 
taken into account by simply adding the 'bank' points at any stage. On the 
other hand, comparison with the results of [5j suggests that clustered nested 
sampling may be more efficient for large D. 

Tables 0] and [5] show the performance of the algorithms Vegas [6 , Suave [7J , 
Divonne [11] , and Cuhre [12], as implemented in the Cuba library 1 1 [7J, when 
applied to toy problems I and II. Table H] quotes the number of integrand 
evaluations, and Table \5\ shows the estimated value of the integral. FiEstAS 



http://www.feynarts.de/cuba 
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Model 


D 


•* Vegas 


-'Suave 


-*Divonne 


-fCuhrc 


^FiEstAS 


I 


2 


5.01 


5.00 


4.95 


5.00 


4.06 




3 


2.29 


3.99 


5.00 


5.00 


4.09 




4 


1.03 


3.99 


3.64 


2.00 


4.56 




5 


2.11 


3.95 


2.61 


2.00 


3.96 




6 


- 


0.99 


2.01 


2.00 


1.00 




7 


- 


0.37 


2.47 


2.00 


4.99 




8 


- 


1.80 


0.58 


1.99 


3.01 


II 


2 


2.00 


2.00 


2.02 


2.00 


2.00 




3 


2.00 


2.00 


1.96 


1.63 


2.01 




4 


1.95 


2.00 


2.01 


1.54 


1.98 




5 


1.37 


2.00 


1.38 


1.60 


2.00 




6 


- 


1.26 


1.20 


- 


2.01 




7 


- 


0.50 


1.00 


- 


0.96 




8 


- 


0.30 


0.51 


- 


0.99 



Table 5 
Estimates I returned by the different algorithms (same runs as in Table SJ). 

sampling is quite competitive in terms of efficiency; the values of N are always 
comparable to or lower than those required by the other methods. Its main 
drawback, however, is the computational overhead; the CPU times in Tables [2] 
and [3] are much larger than those required by the other algorithms (a few 
minutes in the worst cases). In terms of accuracy, the number of detected 
maxima compares very favourably with the other methods. Moreover, the 
estimate I is almost always close to the true value of the integral, ignoring the 
contribution from undetected peaks; among all the examples quoted, only the 
run for toy model I in four dimensions reported a wrong value. 

In order to probe a wider variety of integrands, the different methods are also 
benchmarked with the Genz test suite [3], based on the following six function 
families: 



(1) Oscillatory: 

(2) Product peak: 



/i(x) = cos(c • x + 2ttwi) 



d 



/ 2 (x) = n 



d=\ y Xi Wi ) c i 
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(3) Corner peak: 

(4) Gaussian: 

(5) C°-continuous: 

(6) Discontinuous: 

/e(x) 



/ 3 (x) = (l + c.x)-^ +1 ) 

/ 4 (x) = exp(-|c- (x- w)| 2 ) 

/ 5 (x) = exp(-c- |x-w|) 



exp(c • x) if x\ < w\ and %2 < W2 
otherwise 



The parameter vector c sets the overall difficulty of the problem. Its compo- 
nents are chosen as uniform random numbers from to 1, and then they are 
normalized according to 

D 

||c||i = X;cd=50 (27) 

d=l 

except for the oscillatory integrand, for which I set ||c| |i = 5. The components 
of w are uniform random numbers between and 1. In general terms, they 
do not affect the difficulty of the problem, since they only control the location 
of the features (e.g. maxima) of /. They play an important role, though, for 
the oscillatory family, because w± sets the precise value of the integral, which 
can be arbitrarily close to zero. This poses a problem for FiEstAS sampling 
(and, more generally, to pure Monte Carlo algorithms) because a large number 
of evaluations is required to achieve the cancellation of positive and negative 
terms with the desired accuracy. 

Results of the Genz test in 2, 5, and 8 dimensions are quoted in Table [6] for 
the different algorithms. Each entry shows the average number of evaluations 
over 20 random realizations of each family. In order to speed up the test, 
we impose a maximum number of evaluations iV max = 10 6 . It is clear from 
the comparison that every algorithm has its own strengths and weaknesses. 
Some methods are better suited to solve a particular class of problems, and 
some others perform better in many dimensions. In general terms, the results 
presented here suggest that FiEstAS sampling could be a good choice for 
complicated integrands of low to moderate dimensionality. 



4 Conclusions 



This paper presents a new algorithm to carry out numerical integration in 
multiple dimensions or, in a Bayesian context, sample from the posterior dis- 
tribution and compute the evidence for the assumed model. The method, 
dubbed 'FiEstAS sampling', is a variant of importance sampling where the 
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Genz 


D 


(iVvegas) 


(-^Suave) 


(-^Divonnc) 


(-ZVCuhrc) 


(^FiEstAs) 


1 


2 


( 2 )97056 


66500 


2954 


195 


( 3 ) 165913 




5 


18350 


42000 


2917 


819 


W22110 




8 


68500 


51500 


3697 


3315 


W57411 


2 


2 


5625 


20500 


3186 


2190 


1096 




5 


9100 


31500 


10624 


477422 


16559 




8 


12300 


40000 


23326 


( 19 )469625 


57301 


3 


2 


4500 


13500 


1690 


195 


395 




5 


4500 


20000 


3970 


819 


3231 




8 


7000 


20000 


5495 


3315 


11505 


4 


2 


5750 


23000 


7137 


1768 


2037 




5 


16950 


42000 


30582 


57630 


32559 




8 


30200 


64000 


( 3 ) 52943 


( 10 ) 549185 


183580 


5 


2 


5375 


22500 


4542 


4407 


1595 




5 


10025 


33500 


19747 


( 17 >671853 


21697 




8 


16650 


42500 


W40161 


(20) . 


92247 



6 2 7825 32500 6994 50453 3056 
5 18025 87008 ^64257 84766 25940 
8 ^36079 103006 ( 4 > 152156 ( 12 >604435 115577 

Table 6 

Average number of evaluations required by each algorithm for the test suite pro- 
posed by Genz [9]. The small numbers in parentheses indicate the number of runs 
where the desired accuracy was not reached for N = 10 6 . 

weight of each point is computed with the help of the Field Estimator for 
Arbitrary Spaces [8]. 

Its performance has been tested for several toy problems with known ana- 
lytical solution, specifically designed to contain multimodal distributions and 
significant degeneracies. The results suggest that FiEstAS sampling provides 
an interesting alternative to other methods for problems of low dimensionality. 
In particular, it is able to discover most isolated maxima of the integrand, and 
it can perfectly sample from distributions with pronounced degeneracies. As 
the number of dimensions increases, the ability of the algorithm to identify 
peaks decreases and the required computational resources (in terms of both 
time and memory) grow exponentially. 
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Fig. A.l. FiEstAS algorithm applied to a random uniform distribution of 100 
points in two dimensions. 

A FiEstAS 



The Field Estimator for Arbitrary Spaces (FiEstAS) is a technique to esti- 
mate the continuous (probability) density field underlying a given distribution 
of data points. Particular attention is paid to avoid imposing a metric on the 
data space. Indeed, the problem may actually be regarded as computing the 
appropriate metric, given the data. 

FiEstAS assigns each point a volume v , by means of a k — d tree. The space 
is recursively divided, one dimension at a time, until there is only one single 
data point in each node. The original implementation, described in [8], is heav- 
ily oriented towards a particular problem, namely the estimation of densities 
in phase space (a non-Euclidean, six-dimensional space composed of three- 
dimensional positions and velocities). Here I use a more general version of the 
algorithm, where the dimension less likely to arise from a uniform distribu- 
tion is selected for splitting at each step (very similar to the Shannon entropy 
approach followed by [IS])- More precisely, a histogram with B = 1 + s/N no ^ e 
bins is built for each dimension, and the log-likelihood 



B 



In (A", 



node 



-iV node m(5)-£ln(n 6d ! 



(a.i; 



6=1 



is computed, where the indices 1 < d < D and 1 < b < B denote the 
dimension and the bin number, respectively, n&d is the number of points in 
each bin, and N no & e is the total number of points in the node. In order to 
encourage a similar number of divisions along all dimensions, I add to Ld the 
number of times Sd that the d-t\i axis has already been split, 



La — Ld + Sd- 



(A.2) 



The dimension with smaller L' is divided at the point x sp iit = (x\ + x r )/2, 
where x\ is the maximum x of all points lying on the 'left' side (b < & sp iit ) and 
x r is the minimum x of the points lying on the 'right' (b > b sp \ it ) side. The bin 
1 < frspiit < B is chosen in order that the number of points on each side is as 
close as possible to N no & e /2. 
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The procedure is illustrated in Figure IA.ll where successive steps of the algo- 
rithm are plotted for a random realization of 100 uniformly-distributed data 
points in two dimensions. For this particular realization, the dimension with 
smaller V turned out to be the x-axis (first panel in the figure). Then, for the 
points on the left side, the y-axis was selected for splitting (second panel). The 
same axis was chosen again for the bottom region, and the sequence continued 
by dividing further along the x-, y-, x-, and x-axes. At this moment (third 
panel in the figure) there is only one point on the left side, and control returns 
to the parent node in order to split the right branch (containing, in this case, 
two points). The final tessellation obtained by this method is shown on the 
last panel. 
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